Airbnb Data Analytics Project - Shiny App

Marvin Martin & Aflak Michel Omar (ING5 BDA Gr01A)

26/11/2020

Introduction

Data Extraction

Daily Airbnb data is very valuable: an investor can use it to make key decisions about which real estate option is likely to generate the most profit. In this project, we use scraped data from 6 countries (France, Spain, the Netherlands, Germany, Belgium and Italy), aggregated over a period of time. For each city in these countries, we kept the 3 most recent scraping dates.

These datasets can be downloaded from http://insideairbnb.com/get-the-data.html. A CSV file listing all the scraped URLs, ready to download, is available in data/all_data_urls.csv.

Preprocessing overview

Because these datasets are huge, we preprocessed them to keep only the important information and to work with a reasonable amount of data (to fit our computation and time constraints). We went through several steps:

################### Code From utils/tools.R ##################################
urls <- read.csv(file.path("./data/all_data_urls.csv")) # Step 1 
df <- extract_all_meta(urls) # Step 2 
lastest_dates <- 3 # Step 3
countries <- c("france", "spain", "the-netherlands", "germany", "belgium","italy") # Step 4
download_data(df, countries, lastest_dates) # Step 5
listings <- load_global_listings() # Step 6

We reduced the data size from several GB to only hundreds of MB. We are now ready to play with it!
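To make Steps 2 to 5 more concrete, here is a minimal base-R sketch of the metadata extraction and of the "3 latest dates" filter. It assumes that each URL follows the layout .../<country>/<region>/<city>/<date>/data/listings.csv.gz. The example URLs and the helpers parse_meta and keep_latest are hypothetical stand-ins for illustration, not the project's actual extract_all_meta and download_data.

```r
# Made-up example URLs following the assumed layout:
# http://<host>/<country>/<region>/<city>/<date>/data/listings.csv.gz
base <- "http://data.insideairbnb.com"
urls <- c(
  paste0(base, "/france/ile-de-france/paris/2020-06-11/data/listings.csv.gz"),
  paste0(base, "/france/ile-de-france/paris/2020-07-15/data/listings.csv.gz"),
  paste0(base, "/france/ile-de-france/paris/2020-08-18/data/listings.csv.gz"),
  paste0(base, "/france/ile-de-france/paris/2020-09-16/data/listings.csv.gz"),
  paste0(base, "/germany/be/berlin/2020-08-30/data/listings.csv.gz"),
  paste0(base, "/germany/be/berlin/2020-10-13/data/listings.csv.gz"),
  paste0(base, "/united-kingdom/england/london/2020-10-13/data/listings.csv.gz")
)

# Hypothetical helper: split each URL on "/" and read the path components
# at their assumed positions (host, country, region, city, date, ...).
parse_meta <- function(urls) {
  parts <- strsplit(sub("^https?://", "", urls), "/", fixed = TRUE)
  pick <- function(i) vapply(parts, `[`, character(1), i)
  data.frame(
    country = pick(2),
    region  = pick(3),
    city    = pick(4),
    date    = as.Date(pick(5)),
    url     = urls,
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: keep the selected countries, then the n most recent
# scraping dates of each city.
keep_latest <- function(meta, countries, n) {
  meta <- meta[meta$country %in% countries, ]
  # Sort by city with the newest date first.
  meta <- meta[order(meta$country, meta$city, -as.numeric(meta$date)), ]
  # Rank rows inside each (country, city) group: 1 = newest.
  key <- paste(meta$country, meta$city)
  rank_in_city <- ave(seq_len(nrow(meta)), key, FUN = seq_along)
  meta[rank_in_city <= n, ]
}

meta <- parse_meta(urls)
selected <- keep_latest(meta, countries = c("france", "germany"), n = 3)
print(selected[, c("country", "city", "date")])
```

With this toy input, the oldest Paris snapshot and the London entry are dropped, so only the rows that would actually be downloaded remain.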

Preprocessing Steps

Starting with the raw data, we went through several steps:

[Step 1] Load the CSV file containing the URLs and metadata (read.csv).
[Step 2] Extract “country”, “region”, “city”, “date” and “url” from the CSV into a data frame (extract_all_meta).
[Step 3] Specify the number “n” of latest scraping dates to keep.
[Step 4] Select the list of 6 countries you want to work on.
[Step 5] Go through this data frame line by line and apply the download and preparation steps to each entry (download_data and prepare_data).

[Step 5 - Remark] This big step produces a CSV file for every city of the listed countries.
We could have avoided writing files, but since this step takes more than 10 minutes, we preferred to keep the results on disk.

[Step 6] Get the final preprocessed dataset by merging all the city CSV files into a single data frame (load_global_listings).
This step runs when the server starts and takes around 20 seconds.
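Step 6 can be sketched as follows. This is only an illustration in base R: the helper merge_city_csvs, the temporary directory and the two tiny CSV files are hypothetical, and the real load_global_listings may work differently (for example with data.table).

```r
# Hypothetical helper: read every CSV file in a directory and stack the
# resulting tables into a single data frame.
merge_city_csvs <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  tables <- lapply(files, read.csv, stringsAsFactors = FALSE)
  do.call(rbind, tables)
}

# Build two made-up per-city files in a fresh temporary directory.
demo_dir <- tempfile("city_csvs_")
dir.create(demo_dir)
write.csv(
  data.frame(id = 1:2, city = "paris", price = c(80, 120)),
  file.path(demo_dir, "paris.csv"),
  row.names = FALSE
)
write.csv(
  data.frame(id = 3, city = "berlin", price = 65),
  file.path(demo_dir, "berlin.csv"),
  row.names = FALSE
)

# Merge them, as the server would do once at start-up.
listings_demo <- merge_city_csvs(demo_dir)
print(listings_demo)
```

Doing the merge once when the server starts means the reactive parts of the app only filter an in-memory data frame afterwards.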

Data Overview

Dataset sample:

Here is the shape of our dataset:
# Publications: 1321825
# Features: 21

Features names are:

## - id
## - country
## - region
## - city
## - date
## - neighbourhood_cleansed
## - latitude
## - longitude
## - property_type
## - room_type
## - accommodates
## - bedrooms
## - beds
## - price
## - minimum_nights
## - maximum_nights
## - review_scores_rating
## - availability_30
## - price_30
## - revenue_30
## - latitudelongitude
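The last column, latitudelongitude, is not explained in the text. One plausible reading, stated here as an assumption, is that it joins latitude and longitude into the "latitude:longitude" string that googleVis map charts accept as a location variable. The sketch below builds such a column on a made-up three-row data frame and prints the shape and feature names in the same style as above.

```r
# Made-up sample standing in for the real listings data frame.
listings_demo <- data.frame(
  id        = 1:3,
  city      = c("paris", "berlin", "madrid"),
  latitude  = c(48.8566, 52.52, 40.4168),
  longitude = c(2.3522, 13.405, -3.7038),
  stringsAsFactors = FALSE
)

# Assumption: the combined column uses the "lat:long" format.
listings_demo$latitudelongitude <- paste(
  listings_demo$latitude, listings_demo$longitude, sep = ":"
)

# Shape and feature names of the data frame.
cat("# Publications :", nrow(listings_demo), "\n")
cat("# Features :", ncol(listings_demo), "\n")
cat(paste0("- ", names(listings_demo)), sep = "\n")
```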

Shiny App: Tab 1 - Analysis by comparing several cities

Tab 1 - Analysis by comparing several cities

Shiny App: Tab 2 - Analysis of only one city

Tab 2 - Analysis of only one city

Shiny App: Structure and code overview

Libraries

We use several libraries (web app, graphics, data manipulation) to build this project:
shiny, googleVis, ggplot2, dplyr, data.table, stringr and glue

UI

################### Code From shinyApp/ui.R ##################################
# THIS IS PSEUDOCODE!
fluidPage
  tabsetPanel
    tabPanel # Analysis 1 Tab
      sidebarLayout
        sidebarPanel # Tool Bar
          Checkbox, selectInput, uiOutput, ...
        mainPanel # Plots
          htmlOutput, plotOutput ...
    tabPanel # Analysis 2 Tab
      sidebarLayout
        sidebarPanel # Tool Bar
          Checkbox, selectInput, uiOutput, ...
        mainPanel # Plots
          htmlOutput, plotOutput ...
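The outline above could translate into real Shiny code along these lines. This is a minimal sketch, not the content of shinyApp/ui.R: the helper analysis_tab, every input and output id, and the widget labels and choices are hypothetical.

```r
library(shiny)

# Hypothetical helper: one tab = a sidebar (tool bar) plus a main panel (plots).
# All ids are prefixed so that the two tabs do not clash.
analysis_tab <- function(title, prefix) {
  tabPanel(
    title,
    sidebarLayout(
      sidebarPanel(  # Tool Bar
        checkboxInput(paste0(prefix, "_log"), "Log scale", value = FALSE),
        selectInput(
          paste0(prefix, "_feature"), "Feature",
          choices = c("price", "availability_30", "revenue_30")
        ),
        uiOutput(paste0(prefix, "_city_ui"))  # filled in by the server
      ),
      mainPanel(  # Plots
        htmlOutput(paste0(prefix, "_gvis")),  # target for a googleVis chart
        plotOutput(paste0(prefix, "_plot"))   # target for a ggplot2 / base plot
      )
    )
  )
}

ui <- fluidPage(
  tabsetPanel(
    analysis_tab("Analysis 1", "tab1"),
    analysis_tab("Analysis 2", "tab2")
  )
)
```

Wrapping the repeated layout in a function keeps both tabs structurally identical, which mirrors the duplicated block in the pseudocode.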

Server

################### Code From shinyApp/server.R ##################################
# THIS IS PSEUDOCODE!
listings <- load_global_listings() # Load data
# Server
server
  # Tab 1 variables
  reactive # Reactive data frame (filtered by country / cities / features)
  renderUI  # UI sent from server to uiOutput (checkbox, selectInput, dateSlider)
  renderGvis, renderPlot # Plots sent from server to htmlOutput, plotOutput (histogram, ...)
  # Tab 2 variables
  reactive # Reactive data frame (filtered to one city)
  renderUI  # UI sent from server to uiOutput (checkbox, selectInput, dateSlider)
  renderGvis, renderPlot # Plots sent from server to htmlOutput, plotOutput (map, ...)
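As an illustration of the reactive / renderUI / renderPlot pattern listed above, here is a minimal server sketch for a single tab. It is an assumption-laden example rather than the project's server.R: the toy data frame replaces load_global_listings(), and the ids tab2_city, tab2_city_ui and tab2_plot are hypothetical.

```r
library(shiny)

# Toy stand-in for the data frame returned by load_global_listings().
listings_demo <- data.frame(
  city  = c("paris", "paris", "berlin"),
  price = c(80, 120, 65),
  stringsAsFactors = FALSE
)

server <- function(input, output, session) {
  # Reactive data frame: recomputed whenever the selected city changes.
  city_data <- reactive({
    req(input$tab2_city)  # wait until the select input exists
    listings_demo[listings_demo$city == input$tab2_city, ]
  })

  # UI built on the server and sent to uiOutput("tab2_city_ui").
  output$tab2_city_ui <- renderUI({
    selectInput("tab2_city", "City", choices = unique(listings_demo$city))
  })

  # Plot sent to plotOutput("tab2_plot").
  output$tab2_plot <- renderPlot({
    hist(city_data()$price, main = input$tab2_city, xlab = "price")
  })
}
```

Because the plot reads city_data(), Shiny re-renders it automatically each time the user picks another city.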

App Usage: Analysis 1 Comparing Cities

Each tab is split into two vertical parts: Tool Bar and Plots

Tool Bar

You can:

Plots

App Usage: Analysis 2 Deep Dive in one City

Tool Bar

You can:

Plots

Let’s try this in our Shiny App

library(shiny) # You might need to install more packages (ggplot2, googleVis, ...)

# Option 1: run from a local copy of the repository
setwd("~/YOUR_PATH/Airbnb-Analysis-ShinyApp")
runApp(appDir = "shinyApp")

# Option 2: run directly from GitHub (no local copy needed)
runGitHub("Airbnb-Analysis-ShinyApp", "MarvinMartin24", subdir = "shinyApp")
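The comment in the snippet above notes that more packages may need to be installed. A small check such as the following, based on the libraries listed in the Libraries section, could be run first. It only reports what is missing; the install call is left commented out.

```r
# Packages named in the Libraries section.
required <- c("shiny", "googleVis", "ggplot2", "dplyr",
              "data.table", "stringr", "glue")

# Packages from that list that are not installed on this machine.
missing_pkgs <- setdiff(required, rownames(installed.packages()))

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All required packages are installed.")
}
```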